Some Tests of an Unsupervised Model of Language Acquisition

Authors

  • Bo Pedersen
  • Shimon Edelman
  • Zach Solan
  • David Horn

Abstract

We outline an unsupervised language acquisition algorithm and offer some psycholinguistic support for a model based on it. Our approach resembles Construction Grammar in its general philosophy, and Tree Adjoining Grammar in its computational characteristics. The model is trained on a corpus of transcribed child-directed speech (CHILDES). Because the model can process novel inputs, it can take various standard tests of English that rely on forced-choice judgment and on magnitude estimation of linguistic acceptability. We report encouraging results from several such tests, and discuss the limitations of our present method of dealing with novel stimuli that other tests reveal.

1 The empirical problem of language acquisition

The largely unsupervised, amazingly fast and almost invariably successful learning stint that is language acquisition by children has long been the envy of computer scientists (Bod, 1998; Clark, 2001; Roberts and Atwell, 2002) and a daunting enigma for linguists (Chomsky, 1986; Elman et al., 1996). Computational models of language acquisition, or “grammar induction,” are usually divided into two categories, depending on whether they subscribe to the classical generative theory of syntax or invoke “general-purpose” statistical learning mechanisms. We believe that the polarization between classical and statistical approaches to syntax hampers the integration of the stronger aspects of each method into a common, powerful framework. On the one hand, the statistical approach is geared to take advantage of the considerable progress made to date in distributed representation and probabilistic learning, yet generic “connectionist” architectures are ill-suited to the abstraction and processing of symbolic information. On the other hand, classical rule-based systems excel at just those tasks, yet are brittle and difficult to train.

We are developing an approach to the acquisition of distributional information from raw input (e.g., transcribed speech corpora) that also supports distilling, out of the accrued statistical knowledge, structural regularities comparable to those captured by Context-Sensitive Grammars. In thinking about such regularities, we adopt Langacker’s notion of grammar as “simply an inventory of linguistic units” (Langacker, 1987, p. 63). To detect potentially useful units, we identify and process partially redundant sentences that share the same word sequences. We note that the detection of paradigmatic variation within a slot in a set of otherwise identical aligned sequences (syntagms) is the basis for the classical distributional theory of language (Harris, 1954), as well as for some modern work (van Zaanen, 2000). Likewise, the pattern (the syntagm together with the equivalence class of complementary-distribution symbols that may appear in its open slot) is the main representational building block of our system, ADIOS (for Automatic DIstillation Of Structure). Our goal in the present short paper is to illustrate some of the capabilities of the representations learned by our method vis-à-vis standard tests used by developmental psychologists, by second-language instructors, and by linguists. Thus, the main computational principles behind the ADIOS model are outlined here only briefly. The algorithmic details of our approach and accounts of its learning from CHILDES corpora appear elsewhere (Solan et al., 2003a; Solan et al., 2003b; Solan et al., 2004; Edelman et al., 2004).

2 The principles behind the ADIOS algorithm

The representational power of ADIOS and its capacity for unsupervised learning rest on three principles: (1) probabilistic inference of pattern significance, (2) context-sensitive generalization, and (3) recursive construction of complex patterns. Each of these is described briefly below.
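The distributional idea described above, which finds a slot that varies across otherwise identical aligned sequences and collects its fillers into an equivalence class, can be illustrated with a minimal Python sketch. It is a deliberately simplified stand-in, not the ADIOS algorithm. It compares whole sentences position by position and allows exactly one open slot. It also omits the probabilistic significance test of principle (1). The function name `slot_patterns`, the `_` slot marker, and the toy corpus are hypothetical.

```python
from collections import defaultdict


def slot_patterns(sentences):
    """Hypothetical helper: group sentences that differ at exactly one position.

    Returns a dict mapping a pattern (a tuple of words in which "_" marks
    the open slot) to the set of words observed in that slot.
    """
    classes = defaultdict(set)
    for sent in sentences:
        words = sent.split()
        for i, word in enumerate(words):
            # The syntagm is the sentence with position i replaced by a slot.
            pattern = tuple(words[:i]) + ("_",) + tuple(words[i + 1:])
            classes[pattern].add(word)
    # Keep only slots that show paradigmatic variation (two or more fillers).
    return {p: fillers for p, fillers in classes.items() if len(fillers) > 1}


corpus = [
    "the cat is eating",
    "the dog is eating",
    "the horse is eating",
    "the dog is sleeping",
]
for pattern, fillers in slot_patterns(corpus).items():
    print(" ".join(pattern), "->", sorted(fillers))
# the _ is eating -> ['cat', 'dog', 'horse']
# the dog is _ -> ['eating', 'sleeping']
```

Each reported entry pairs a syntagm with the equivalence class of words that fill its open slot, which is the pattern unit the text describes. A realistic learner would also have to decide which of these candidate patterns are significant enough to keep.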
[Figure: an example of a hierarchical ADIOS pattern (P84), built from sub-patterns (e.g., P49 = a E50, P51 = the E50, P61 = who E62) and equivalence classes (e.g., E50 = bird | cat | cow | dog | horse | rabbit; E60 = flies | jumps | laughs; E62 = adores | loves | scolds | worships; E64 = Beth | Cindy | George | Jim | Joe | Pam | P49 | P51).]
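Principle (3), recursive construction of complex patterns, can be illustrated with the fragments recoverable from the figure. A pattern is a sequence of symbols, an equivalence class is a set of alternatives, and either may contain the other. For instance, the class E64 appears to include the patterns P49 and P51 alongside proper names. The sketch below is a hypothetical toy that assumes this reading of the figure. The dictionaries `PATTERNS` and `CLASSES` and the function `expand` are illustrative names, not ADIOS data structures. Expanding a symbol recursively enumerates every word string it covers.

```python
from itertools import product

# Hypothetical toy inventory, based on one reading of the figure residue.
# A pattern is an ordered sequence of symbols; an equivalence class is a
# list of alternative symbols. Any string that is not a key in either
# dictionary is treated as a terminal word.
PATTERNS = {
    "P49": ["a", "E50"],
    "P51": ["the", "E50"],
    "P61": ["who", "E62"],
}
CLASSES = {
    "E50": ["bird", "cat", "cow", "dog", "horse", "rabbit"],
    "E62": ["adores", "loves", "scolds", "worships"],
    # A class may contain patterns as well as words (recursion).
    "E64": ["Beth", "Cindy", "George", "Jim", "Joe", "Pam", "P49", "P51"],
}


def expand(symbol):
    """Return every word string that `symbol` can generate, recursively."""
    if symbol in PATTERNS:
        # A pattern concatenates one expansion of each part, in order.
        parts = [expand(part) for part in PATTERNS[symbol]]
        return [" ".join(combo) for combo in product(*parts)]
    if symbol in CLASSES:
        # An equivalence class is the union of its members' expansions.
        return [s for member in CLASSES[symbol] for s in expand(member)]
    return [symbol]  # terminal word


print(len(expand("E64")))  # 18: six names + six "a ..." + six "the ..."
print(expand("P61")[:2])   # ['who adores', 'who loves']
```

Because this toy inventory is acyclic, the recursion always terminates. An inventory in which a pattern could eventually contain itself would need a depth limit or a different generation strategy.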

Publication date: 2004